Deep Active Learning for Named Entity Recognition
Deep learning has yielded state-of-the-art performance on many natural
language processing tasks including named entity recognition (NER). However,
this typically requires large amounts of labeled data. In this work, we
demonstrate that the amount of labeled training data can be drastically reduced
when deep learning is combined with active learning. While active learning is
sample-efficient, it can be computationally expensive since it requires
iterative retraining. To speed this up, we introduce a lightweight architecture
for NER, viz., the CNN-CNN-LSTM model consisting of convolutional character and
word encoders and a long short-term memory (LSTM) tag decoder. The model
achieves nearly state-of-the-art performance on standard datasets for the task
while being computationally much more efficient than the best-performing models. We
carry out incremental active learning during the training process and are
able to nearly match state-of-the-art performance with just 25% of the
original training data.
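To make the architecture concrete, below is a minimal PyTorch sketch of a CNN-CNN-LSTM tagger: a convolutional character encoder, a convolutional word encoder, and an LSTM tag decoder. Layer sizes, vocabulary sizes, and the teacher-forced decoding are illustrative assumptions, not the paper's exact configuration.

# Minimal sketch of a CNN-CNN-LSTM tagger (illustrative; dimensions and
# decoding details are assumptions, not the paper's exact setup).
import torch
import torch.nn as nn

class CNNCNNLSTMTagger(nn.Module):
    def __init__(self, char_vocab=100, word_vocab=10000, n_tags=9,
                 char_dim=30, word_dim=100, hidden=200):
        super().__init__()
        self.char_emb = nn.Embedding(char_vocab, char_dim, padding_idx=0)
        # Convolutional character encoder: convolve over characters, max-pool per word.
        self.char_cnn = nn.Conv1d(char_dim, char_dim, kernel_size=3, padding=1)
        self.word_emb = nn.Embedding(word_vocab, word_dim, padding_idx=0)
        # Convolutional word encoder over concatenated word + character features.
        self.word_cnn = nn.Conv1d(word_dim + char_dim, hidden, kernel_size=3, padding=1)
        # LSTM tag decoder: consumes encoder output plus the previous tag embedding.
        self.tag_emb = nn.Embedding(n_tags + 1, 50)  # +1 for a start-of-sequence tag
        self.decoder = nn.LSTM(hidden + 50, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_tags)

    def forward(self, words, chars, prev_tags):
        # words: (batch, seq), chars: (batch, seq, word_len), prev_tags: (batch, seq)
        b, s, c = chars.shape
        ch = self.char_emb(chars.view(b * s, c)).transpose(1, 2)   # (b*s, char_dim, c)
        ch = torch.relu(self.char_cnn(ch)).max(dim=2).values       # (b*s, char_dim)
        ch = ch.view(b, s, -1)
        w = torch.cat([self.word_emb(words), ch], dim=-1).transpose(1, 2)
        enc = torch.relu(self.word_cnn(w)).transpose(1, 2)         # (b, s, hidden)
        dec_in = torch.cat([enc, self.tag_emb(prev_tags)], dim=-1)
        h, _ = self.decoder(dec_in)                                # teacher-forced decoding
        return self.out(h)                                         # (b, s, n_tags) logits

# Toy usage with random ids, just to check shapes.
model = CNNCNNLSTMTagger()
words = torch.randint(1, 10000, (2, 7))
chars = torch.randint(1, 100, (2, 7, 12))
prev_tags = torch.randint(0, 9, (2, 7))
print(model(words, chars, prev_tags).shape)  # torch.Size([2, 7, 9])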
Dense Information Flow for Neural Machine Translation
Recently, neural machine translation has achieved remarkable progress by
introducing well-designed deep neural networks into its encoder-decoder
framework. From the optimization perspective, residual connections are adopted
in most of these deep architectures to improve learning for both the encoder and
the decoder, and advanced attention connections are applied as well.
Inspired by the success of the DenseNet model in computer vision problems, in
this paper, we propose a densely connected NMT architecture (DenseNMT) that
trains more efficiently. The proposed DenseNMT not only allows dense
connections when creating new features for both the encoder and the decoder, but
also uses a dense attention structure to improve attention quality. Our
experiments on multiple datasets show that the DenseNMT structure is more
competitive and efficient.
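As a rough illustration of the dense-connection idea, the sketch below stacks encoder layers DenseNet-style, feeding each layer the concatenation of the input and all earlier layer outputs. The feed-forward layer type, dimensions, and final projection are assumptions for illustration, not the exact DenseNMT design.

# DenseNet-style dense connections in an encoder stack (illustrative sketch;
# the layer type and sizes are assumptions, not the DenseNMT specification).
import torch
import torch.nn as nn

class DenseEncoder(nn.Module):
    def __init__(self, d_model=256, growth=128, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        dim = d_model
        for _ in range(n_layers):
            # Each layer sees the concatenation of the input and all earlier outputs.
            self.layers.append(nn.Sequential(nn.Linear(dim, growth), nn.ReLU()))
            dim += growth
        self.proj = nn.Linear(dim, d_model)  # project back for attention/decoding

    def forward(self, x):
        feats = [x]                          # x: (batch, seq, d_model)
        for layer in self.layers:
            new = layer(torch.cat(feats, dim=-1))
            feats.append(new)                # dense connection: keep every feature map
        return self.proj(torch.cat(feats, dim=-1))

enc = DenseEncoder()
src = torch.randn(2, 11, 256)
print(enc(src).shape)  # torch.Size([2, 11, 256])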
Simple, efficient and robust approaches for large scale learning
Robustness of a model plays a vital role in large-scale machine learning. Classical estimators from robust statistics do not offer satisfactory computational efficiency as data size and model complexity grow. We draw ideas from robust statistics and focus on providing simple and efficient algorithmic paradigms for large-scale learning that are provably robust to corrupted training samples. We start from standard supervised and unsupervised problems, and then move towards several semi-supervised settings, including mixed linear regression as well as multi-instance multi-label learning. We analyze the algorithms under regular statistical settings with mild assumptions, thus providing theoretical support for applying the ideas to large-scale learning models such as deep neural networks. These simple algorithms serve as strong baselines and have achieved state-of-the-art results on certain tasks. The algorithmic paradigm is applicable to a wide range of problems, and our theoretical insights may also guide future research on robust large-scale learning.
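As a generic illustration of training that is robust to corrupted samples, the sketch below takes a gradient step on a trimmed loss, discarding the fraction of samples with the largest per-sample loss. This is a standard idea from this line of work, not necessarily the specific algorithm proposed here; the corruption fraction eps and the toy linear-regression setup are assumptions.

# Trimmed-loss gradient step (generic illustration of robustness to corrupted
# samples; eps and the toy setup are assumptions, not this work's algorithm).
import torch

def trimmed_loss_step(model, loss_fn, x, y, optimizer, eps=0.2):
    # One gradient step that ignores the eps fraction of samples with the largest loss.
    losses = loss_fn(model(x), y)                   # per-sample losses, shape (n,)
    k = int((1.0 - eps) * losses.numel())           # number of samples to keep
    kept, _ = torch.topk(losses, k, largest=False)  # keep the k smallest losses
    optimizer.zero_grad()
    kept.mean().backward()
    optimizer.step()
    return kept.mean().item()

# Toy usage: linear regression with a few corrupted labels.
torch.manual_seed(0)
x = torch.randn(100, 5)
y = x @ torch.ones(5, 1)
y[:10] += 50.0                                      # corrupt 10% of the labels
model = torch.nn.Linear(5, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
mse = torch.nn.MSELoss(reduction="none")
for _ in range(200):
    trimmed_loss_step(model, lambda p, t: mse(p, t).mean(dim=1), x, y, opt)
print(model.weight.data)                            # close to the true all-ones weights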